12151_differentially_private_general.pdf

Neural Information Processing Systems

A.3 Low Dimension. Before presenting the proof of Theorem 1, we provide formal statements of its corollaries. We then bound average argument stability in terms of average regret (Lemma 5). Substituting these into the above equation gives the claimed bound. We now fill in the details. Thus, substituting the above into Eqn. (3) along with the bound from Lemma 6, we obtain the claimed bound on $\mathbb{E}[L(\widehat{w}; D) - L(w^*; D)]$; substituting the value of $G$ completes the proof.
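The displayed bound itself does not survive in the excerpt. A standard decomposition of this kind, given here only as a hedged sketch with assumed notation ($G$ a Lipschitz constant, $\Delta_t$ the argument stability of the $t$-th iterate, $\mathrm{Reg}_T$ the regret of the underlying online algorithm over $T$ steps), takes the form
\[
\mathbb{E}\big[L(\widehat{w}; D) - L(w^*; D)\big]
\;\le\; \frac{G}{T}\sum_{t=1}^{T} \mathbb{E}[\Delta_t]
\;+\; \frac{\mathbb{E}[\mathrm{Reg}_T]}{T},
\]
i.e., excess population risk is controlled by average argument stability plus average regret, matching the proof outline above.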


APPENDIX: In this section, we provide the details of our implementation and proofs for reproducibility

Neural Information Processing Systems

... hidden state by $h$. Then we need to calculate the second part of the equation; using Bayes' theorem, we have $p(\dots)$. In Section 4.3, we devise a sigmoid function to adapt $\gamma$ during the supernet training, defined as $\gamma(t) = 1 - \mathrm{Sigmoid}\big((\tfrac{t}{\text{total epochs}} \cdot 2 - 1) \cdot b\big)$ (19). Section 3.2 theoretically demonstrates the benefit of the proposed architecture complementation loss function.
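As a quick illustration of Eq. (19), here is a minimal Python sketch of the schedule; the parameter names (total_epochs, slope b) and the exact argument scaling are assumptions inferred from the extracted formula, not the authors' code.

import math

def gamma(t: int, total_epochs: int, b: float) -> float:
    # Eq. (19), as reconstructed: map epoch t to [-1, 1], scale by b,
    # pass through a sigmoid, and flip so gamma decays from ~1 to ~0.
    x = (2.0 * t / total_epochs - 1.0) * b
    return 1.0 - 1.0 / (1.0 + math.exp(-x))

# gamma starts near 1, crosses 0.5 at mid-training, and ends near 0.
print([round(gamma(t, total_epochs=100, b=5.0), 3) for t in (0, 50, 100)])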


Magnitude and Angle Dynamics in Training Single ReLU Neurons

Lee, Sangmin, Sim, Byeongsu, Ye, Jong Chul

arXiv.org Artificial Intelligence

To understand the learning dynamics of deep ReLU networks, we investigate the dynamical system of gradient flow $w(t)$ by decomposing it into magnitude $\|w(t)\|$ and angle $\phi(t) := \pi - \theta(t)$ components. In particular, for multi-layer single ReLU neurons with a spherically symmetric data distribution and the square loss function, we provide upper and lower bounds for the magnitude and angle components to describe the dynamics of gradient flow. Using the obtained bounds, we conclude that small-scale initialization induces slow convergence for deep single ReLU neurons. Finally, by exploiting the relation between gradient flow and gradient descent, we extend our results to the gradient descent approach. All theoretical results are verified by experiments.
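The magnitude/angle decomposition is easy to observe numerically. Below is a minimal Python sketch, under simplifying assumptions not taken from the paper (a single one-layer ReLU neuron, a ReLU teacher, Gaussian inputs), that trains with gradient descent on the square loss and prints $\|w(t)\|$ and $\phi(t) = \pi - \theta(t)$, where $\theta(t)$ is the angle between $w$ and the teacher direction.

import numpy as np

rng = np.random.default_rng(0)
d = 10
v = np.zeros(d); v[0] = 1.0                # teacher direction (unit norm)
w = 1e-2 * rng.standard_normal(d)          # small-scale initialization

def relu(z):
    return np.maximum(z, 0.0)

lr, steps, n = 0.05, 2000, 512
for step in range(steps + 1):
    X = rng.standard_normal((n, d))        # spherically symmetric data
    pred, target = relu(X @ w), relu(X @ v)
    # Square loss 0.5 * mean((pred - target)^2); gradient w.r.t. w
    grad = ((pred - target) * (X @ w > 0)) @ X / n
    if step % 500 == 0:
        mag = np.linalg.norm(w)
        cos = w @ v / (mag + 1e-12)        # v has unit norm
        theta = np.arccos(np.clip(cos, -1.0, 1.0))
        print(f"step {step:4d}  |w| = {mag:.4f}  phi = {np.pi - theta:.4f}")
    w -= lr * grad

With small initialization the magnitude stays tiny for many steps before growing, which is the slow-convergence effect the bounds formalize.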


Active Learning for Identification of Linear Dynamical Systems

Wagenmaker, Andrew, Jamieson, Kevin

arXiv.org Machine Learning

We propose an algorithm to actively estimate the parameters of a linear dynamical system. Given complete control over the system's input, our algorithm adaptively chooses the inputs to accelerate estimation. We show a finite-time bound quantifying the estimation rate our algorithm attains and prove matching upper and lower bounds which guarantee its asymptotic optimality, up to constants. In addition, we show that this optimal rate is unattainable when using Gaussian noise to excite the system, even with optimally tuned covariance, and analyze several examples where our algorithm provably improves over rates obtained by playing noise. Our analysis relies critically on a novel result quantifying the error in estimating the parameters of a dynamical system when arbitrary periodic inputs are played. We conclude with numerical examples that illustrate the effectiveness of our algorithm in practice.
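As a toy illustration of the periodic-input phenomenon (not the paper's algorithm), the sketch below identifies $A$ in $x_{t+1} = A x_t + B u_t + w_t$ by least squares and compares a sinusoidal input against Gaussian noise of matched average power; all system matrices and signal choices are assumptions for illustration.

import numpy as np

rng = np.random.default_rng(1)
A = np.array([[0.9, 0.2], [0.0, 0.9]])
B = np.eye(2)
T, sigma = 2000, 0.1

def rollout(inputs):
    # Simulate x_{t+1} = A x_t + B u_t + w_t; return regressors and targets.
    x = np.zeros(2)
    X, Xn, U = [], [], []
    for u in inputs:
        xn = A @ x + B @ u + sigma * rng.standard_normal(2)
        X.append(x); Xn.append(xn); U.append(u)
        x = xn
    return np.array(X), np.array(Xn), np.array(U)

def estimate_A(X, Xn, U):
    # Least-squares fit of [A B] from stacked regressors [x_t, u_t].
    Z = np.hstack([X, U])
    Theta, *_ = np.linalg.lstsq(Z, Xn, rcond=None)
    return Theta.T[:, :2]                  # first block is the A estimate

t = np.arange(T)
periodic = np.stack([np.sin(0.3 * t), np.cos(0.3 * t)], axis=1)  # periodic excitation
gaussian = rng.standard_normal((T, 2)) / np.sqrt(2)              # matched average power

for name, u in [("periodic", periodic), ("gaussian", gaussian)]:
    A_hat = estimate_A(*rollout(u))
    print(name, "error:", np.linalg.norm(A_hat - A))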


Neural tangent kernels, transportation mappings, and universal approximation

Ji, Ziwei, Telgarsky, Matus, Xian, Ruicheng

arXiv.org Machine Learning

This paper establishes rates of universal approximation for the shallow neural tangent kernel (NTK): network weights are only allowed microscopic changes from random initialization, which entails that activations are mostly unchanged, and the network is nearly equivalent to its linearization. Concretely, the paper has two main contributions: a generic scheme to approximate functions with the NTK by sampling from transport mappings between the initial weights and their desired values, and the construction of transport mappings via Fourier transforms. Regarding the first contribution, the proof scheme provides another perspective on how the NTK regime arises from rescaling: redundancy in the weights due to resampling allows individual weights to be scaled down. Regarding the second contribution, the most notable transport mapping asserts that roughly $1 / \delta^{10d}$ nodes are sufficient to approximate continuous functions, where $\delta$ depends on the continuity properties of the target function. By contrast, nearly the same proof yields a bound of $1 / \delta^{2d}$ for shallow ReLU networks; this gap suggests a tantalizing direction for future work, separating shallow ReLU networks and their linearization.
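The "nearly equivalent to its linearization" claim can be checked directly. The following Python sketch (scaling choices and perturbation size are assumptions, not the paper's construction) compares the output change of a wide shallow ReLU network under a microscopic weight perturbation against the change predicted by its linearization at initialization.

import numpy as np

rng = np.random.default_rng(0)
d, m = 5, 50_000                        # input dim, width (large -> NTK regime)
x = rng.standard_normal(d); x /= np.linalg.norm(x)

W0 = rng.standard_normal((m, d))        # random init of hidden weights
a = rng.choice([-1.0, 1.0], size=m)     # fixed outer signs, 1/sqrt(m) scaling

def f(W):
    return (a @ np.maximum(W @ x, 0.0)) / np.sqrt(m)

# Microscopic perturbation: most activation patterns stay unchanged.
delta = 1e-3 * rng.standard_normal((m, d)) / np.sqrt(m)

# Gradient of f w.r.t. W at W0: row j is a_j * 1[w_j . x > 0] * x / sqrt(m).
G = (a * (W0 @ x > 0.0))[:, None] * x[None, :] / np.sqrt(m)

true_change = f(W0 + delta) - f(W0)
linear_change = np.sum(G * delta)
print("true:", true_change, "linearized:", linear_change)

At this scale the two changes agree to many digits, which is the sense in which the network behaves like its linearization.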


Robust stability of moving horizon estimation for nonlinear systems with bounded disturbances using adaptive arrival cost

Deniz, Nestor N., Murillo, Marina H., Sanchez, Guido, Genzelis, Lucas M., Giovanini, Leonardo

arXiv.org Artificial Intelligence

In this paper, robust stability and convergence to the true state are established for a moving horizon estimator based on an adaptive arrival cost, applied to nonlinear detectable systems. Robust global asymptotic stability is shown for the case of non-vanishing bounded disturbances, whereas convergence to the true state is proved for the case of vanishing disturbances. Several simulations were carried out to show the estimator's behaviour under different operating conditions and to compare it with state-of-the-art estimation methods.
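For concreteness, here is a minimal Python sketch of the moving horizon idea on a linear toy system; the paper treats nonlinear systems with an adaptive arrival cost, whereas this sketch assumes a fixed arrival-cost weight and a crude prior update, both simplifications.

import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(0)
A = np.array([[1.0, 0.1], [0.0, 1.0]])    # toy linear dynamics
C = np.array([[1.0, 0.0]])                # measure position only
Q, R, P = 0.05, 0.1, 1.0                  # disturbance / noise / arrival-cost weights

# Simulate the true system with bounded disturbances.
T, x = 60, np.array([0.0, 1.0])
xs, ys = [], []
for _ in range(T):
    xs.append(x)
    ys.append(C @ x + R * rng.uniform(-1, 1, 1))
    x = A @ x + Q * rng.uniform(-1, 1, 2)

def mhe_window(y_win, x_prior, N):
    # Decision variables: initial state x0 and disturbances w_0..w_{N-2}.
    def residuals(z):
        x0, w = z[:2], z[2:].reshape(N - 1, 2)
        r, xk = [(x0 - x_prior) / np.sqrt(P)], x0      # arrival cost
        for k in range(N):
            r.append((y_win[k] - C @ xk) / np.sqrt(R)) # measurement fit
            if k < N - 1:
                r.append(w[k] / np.sqrt(Q))            # disturbance penalty
                xk = A @ xk + w[k]
        return np.concatenate(r)
    z = least_squares(residuals, np.concatenate([x_prior, np.zeros(2 * (N - 1))])).x
    x0, w = z[:2], z[2:].reshape(N - 1, 2)
    traj = [x0]
    for k in range(N - 1):
        traj.append(A @ traj[-1] + w[k])
    return traj                                        # smoothed states over the window

N, x_prior = 10, np.array([0.0, 0.0])
for t in range(N, T + 1):
    traj = mhe_window(ys[t - N:t], x_prior, N)
    x_prior = traj[1]        # prior for the next window start
print("estimate:", traj[-1], "true:", xs[-1])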